Region and Keyword Extraction Based on Color Marking for Document Entry

Authors

  • Mineo Shoman
  • Takashi Nishimura
  • Toshiya Kawauchi
Abstract

A method is presented for extracting an article and its keywords from a notated document using color image processing. The user marks the region of an article and its keywords in color on the printed document; the system then extracts the article and recognizes the keywords. This paper describes (1) colored area extraction and color effect removal by adaptive thresholding in pseudo-HVC coordinates, based on the "transparent coloration model", which is based on the behavior of colored pixels in color space; (2) polygonal shape extraction, which recovers the intended article from a hand-drawn outline using a new and simple algorithm, "recursive vertex search"; and (3) experimental results that show high accuracy for newspaper article extraction. The color marks on article borders and keywords are identified and then removed by adaptive thresholding in the pseudo-HVC coordinates. Because the color marks are hand-drawn, they are not completely accurate. We have developed a polygonal shape extraction algorithm that examines the marked outline and determines which part of the text is to be extracted. This algorithm is not yet complete because it needs information on typical document architecture. When complete, the system will extract the desired article and store it as an image in an electronic filing system.
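The idea of separating color marks from the underlying print can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not define the pseudo-HVC coordinates or the adaptive threshold, so this example substitutes the standard HSV conversion from Python's `colorsys` module and two fixed thresholds. The function name `split_marks` and the parameters `chroma_min` and `ink_max` are hypothetical. The only assumption borrowed from the "transparent coloration model" is that a marker tints the paper while dark ink underneath stays dark.

```python
import colorsys

def split_marks(pixels, chroma_min=0.25, ink_max=0.45):
    """Separate colored marker pixels from print.

    pixels: 2-D list of (r, g, b) tuples with channels in 0..255.
    Returns (mark_mask, cleaned):
      mark_mask[y][x] is True where a bright, saturated (marker) pixel was found;
      cleaned[y][x] is 0 for ink and 255 for paper, with marker color removed.
    HSV and the fixed thresholds are stand-ins (assumptions), not pseudo-HVC.
    """
    mark_mask, cleaned = [], []
    for row in pixels:
        mask_row, clean_row = [], []
        for r, g, b in row:
            _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            # Marker on bare paper: clearly colored and still bright.
            marked = s >= chroma_min and v > ink_max
            mask_row.append(marked)
            # Dark pixels are treated as ink even if tinted by the marker;
            # everything else, including marker on paper, becomes white.
            clean_row.append(255 if v > ink_max else 0)
        mark_mask.append(mask_row)
        cleaned.append(clean_row)
    return mark_mask, cleaned
```

Called on a row containing white paper, yellow marker, black ink, and ink under yellow marker, the sketch flags only the marker-on-paper pixel and keeps both ink pixels black in the cleaned image. A real system would replace the fixed thresholds with ones adapted to the local image statistics, as the abstract indicates.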


Similar resources

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. In the proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method, is used in this paper to imp...


ASHRAM: Active Summarization and Markup

Typically, searching for information in a document collection amounts to refining a query and then scanning a large number of documents to determine their relevance. Active Summarization Having Related Active Markup (ASHRAM) is a facility for representing and automatically selecting, marking, and linking useful and/or salient items in a document, to make it easier for the user to determine the ...


Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...


A Fuzzy Based Three Color Meter/Marker for Diffserv Networks (RESEARCH NOTE)

Differentiated Services (Diffserv), which was proposed by the Internet Engineering Task Force (IETF), is a scalable and robust model for providing end-to-end QoS. In Diffserv networks, metering mechanisms are used to measure traffic streams. The single rate Three Color Meter (srTCM) [1], which was proposed by the IETF, meters an IP packet stream and marks its packets either green, yellow, or red....


Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...



Publication date: 1988